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Abstract 

Statistical data assimilation systems require the specification of forecast and observation 
error statistics. Forecast error is due to model imperfections and differences between the ini- 
tial condition and the actual state of the atmosphere. Practical four-dimensional variational 
(4D-Var) methods try to fit the forecast state to the observations and assume that the model 
error is negligible. Here with a number of simplifying assumption, a framework is developed 
for isolating the model error given the forecast error at two lead-times. Two definitions are 
proposed for the Talagrand ratio r, the fraction of the forecast error due to model error rather 
than initial condition error. Data from the CPTEC Eta Model running operationally over South 
America are used to calculate forecast error statistics and lower bounds for r. 


1 Introduction 


Data assimilation systems combine satellite data and other measurements with a first guess coming 
from a predictive model to produce an analysis or estimate of the state of the atmosphere. This 
estimate can be used as an initial condition for numerical weather prediction, or a sequence of 
estimates can be used to study Earth Science phenomena. In statistical data assimilation methods 
the analysis is a weighted average of the model forecast and current observations. The weighting 
is determined by the specification of forecast and observation error statistics. For instance, where 
forecast error is large, more weight is given to observations. Forecast error statistics also determine 
how observations correct forecast errors in a neighborhood of the observation. 
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Figure 1: Contour plots at 500 mb of the forecast height error bias at (a) 24 hours (b) 48 hours and 
of the forecast height error standard deviation at (c) 24 hours and at (d) 48 hours; units are meters. 


and 12 UTC. The CPTEC Eta Model forecast error is approximated by the difference of the fore- 
cast and the NCEP operational analysis. We note that quality of the 0Z and 12Z NCEP analyses is 
likely different since the number of observations is greater at 12Z. The NCEP analysis, interpolated 
onto the Eta grid, is also the initial condition for the operational CPTEC Eta model. We calculate 
forecast error statistics for the period during August of 1998 using a subset of the complete set 
of predicted variables, namely the height fields at three levels 300, 500 and 700 mb on the region 
85W to 30W and 45S to 10S interpolated onto a 0.4° x 0.4° grid. The forecast error mean and 
standard deviation are shown at the 500 mb level in Fig. 1. 

Much of the systematic difference between forecast and analysis is related to the spectral to- 
pography representation of the analysis. The spectral topography of the NCEP analysis is very 
different from the mountain-step coordinate representation of the Eta model near the Andes. The 
resulting NCEP analysis fields when interpolated onto the Eta model grid are incompatable with 
the Eta model topography. Features in the western part of the domain are independent of the 
forecast lead-time while systematic differences between Eta forecast and NCEP analysis over the 
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2 Methodology 

In the context of linear dynamics, we can obtain the model error covariance given the forecast 
error covariances at two lead-times, for instance given the 24-hour and 48-hour forecast error 
covariances. Define forecast error e k - k+ i to be the error of the forecast starting at time k and valid 
at time k + 1. We use time-steps of 24-hours. The 24-hour forecast error e k]k+ \ is the difference at 
time k + 1 of the 24-hour forecast and the true state w k+l , 

€fc;fc+l = Mfc ; A:+lWfc — W l k+1 J (1) 

w% is the analysis at time k and M *.;*+! is the linear operator that advances the state from time k to 
time k + 1. Likewise, define the 48-hour forecast error e fc;fc+2 to be the difference of the 48-hour 
forecast and the true state w k+2 at time k + 2, 

tk-,k +2 = M k . k+2 w l - w l k+2 . (2) 

The 48-hour forecast error is due to the propagated 24-hour forecast error and the model error 
as shown by 

£k-,k + 2 — M fc; fc +2 w k — w k+ 2 = Mfc + i ; fc + 2(Mfc;/- + iU>£ — w[ + i) — 4+2 
= M*+l;fc+2eA:;fc+l — 4 +2 • 

The model error e k+1 is defined by 

€ fc + 1 = W k + l — + . (4) 

The 24-hour and 48-hour forecast error covariances Pfc ; fc+i and Pfc ; * + 2 satisfy 

Pfc ;*+2 = ( e *:;fc+24|fc+2) = M fc+l;fc+2Pfc;fc+lM^ +1 . fc+2 + Q fc+ i , (5) 

where we have taken ( 4 ( 4 ) r ) = Q*; we use the notation (•) to denote ensemble average. There- 
fore, the model error covariance Q*. +1 can be obtained given the dynamics M fc+ i : ^ + 2 and the fore- 
cast error covariances Pfc ; jfc+i and P*;fc+ 2 . 

There are difficulties with the method presented above. First, the true state, and hence the 
forecast error, is unknown. Therefore, we make the approximation of replacing the true state by 
the verifying analysis, i.e., instead of (1) we take 

tk'M 1 = M *;fc+1< - Wfc+1 ■ ( 6 ) 
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Another lower bound for r var depends on the variance Ai(P 24 ) associated with the leading 
eigenvector of the 24 hour forecast error covariance and the quantity tr MM T , namely: 

r V ar>l- V^tr(MM T ). (10) 

tr r 48 

This lower bound reflects a worse-case situation where spectrum of the 24 hour forecast error 
covariance is flat. Recall that 

n 

trMM r = ^CT t 2 (M), (11) 

i = 1 

and is the expected amplification of the variance of an uncorrelated, homogeneous random initial 
condition (Tippett 1999). Therefore, given the singular values of M and the eigenvalues of the 24 
and 48 hour forecast error covariance, lower bounds for r var can be obtained. 

Another measure of the model error is the volume Talagrand ratio r vol defined as (Schneider & 
Griffies 1999) 


n is the dimension of the state-space. This definition has the following geometric interpretation. 
For a Gaussian random variable with zero mean and covariance P, the ellipsoid £ p (P) that encloses 
some fraction 0 < p < 1 of the cumulative probability distribution has a volume proportional to 
(det P) 1/2 . Therefore, the Talagrand ratio r vo i is the square of the geometric mean of the semiaxis 
lengths of the model error ellipsoid £ P (Q) over the square of the geometric mean of the semiaxis 
lengths of the 48-hour forecast error ellipsoid £ P (Q). We note that this definition is invariant under 
general nonsingular transformations of the state-space. As a consequence, r v0 1 does not depend on 
the choice of inner product. A lower bound for r vo i is 

r-> !-(<*. M)»£|£) , 'V (13) 

Special care in calculating r vo i must be taken when the covariance or dynamics matrices are sin- 
gular. The simplest remedy and the one we use here is to compute the determinants on a reduced 
spaces where the matrices are nonsingular. 

In Fig. 3 we plot lower bounds for r va r and r vo i as function of the singular values of the dynamics 
for the approximate forecast error covariance matrices calculated here for the Eta model. If the 
model dynamics does not amplify the 24-hour forecast error, the 48-hour forecast error is due 
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Figure 4: The standard deviations of the (a) 24-hour and (b) 48-hour forecast anomaly height fields 
plotted at 500 mb. Units are meters 


modes associated with small eigenvalues are likely to suffer from sampling error. On the other 
hand, keeping too few modes produces dynamics that does not produce growth. Here we choose to 
retain 5 modes of P{ A explaining 85% of its total variance. Later we show how the lower bounds 
for r depend on the number of modes retained. 

Figure 4 shows the standard deviation of the 24 and 48 hour forecast height anomaly fields. 
As expected there are only slight differences between the standard deviation of the 24 and 48 hour 
height anomaly fields; both should approximate the natural variability of the height field during this 
period. The variability of the model may be different from the model error. For instance, model 
error may be a results of the model not presenting the same variability seen in nature. However, 
there are some similarities between the two here. Model error is much noisier. Calculation of the 
principle angles show that the leading eigenmodes of P{ A ar| d P{g span approximately the same 
subspaces. 

Figure 5(a) shows the singular values of M. Although the dynamics is stable by construction 
with all eigenvalues inside the unit circle, there are singular values greater than one, indicating 
nonmodal growth. Figure 5(b) shows the extent to which the Markov model dynamics is able 
to propagate the 24 hour forecast height anomaly. The deterministic part of the signal is not 
substantially larger than the random component. The covariance of the residual ( bb T ) is computed 
in the full-space and is larger than the truncated part of the anomaly covariances. 

Figure 6 shows the leading right and left singular vectors of M. Their structure is related to the 
entrance of fronts and cyclogenesis in the Atlantic; 9 fronts passed through the region during the 
period. Using the singular values of the Markov model M in (9) and (10) gives as lower bound for 
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Figure 7: The lower bounds in Eq. (9) (solid line), Eq. (10) (dashed line) and Eq. (13) (dotted 
dashed line) as function of the number k of modes used in the Markov model calculation. 

the Talagrand ratio r var > 0.37 and r var > 0.76 respectively; (13) gives r vo i > 0.70 where we have 
conservatively used only 3 modes to compute determinants. Using more modes in the calculation 
of the Markov dynamics increases the singular values and decreases the lower bound as shown in 
Fig. 7. Of the three lower bounds, (9) is the most sensitive to the number of modes retained since 
it depends only on the size of the largest singular value cti(M) of M. 

The lower bounds for r come from assuming that the 24-hour errors project favorably onto 
the growing modes of the dynamics. However, the dimension of both the subspace of dominant 
forecast errors and of the subspace of dynamically growing modes, around 10, is small compared 
to the dimension of the full space 36,990. The likelihood that two arbitrarily chosen subspaces 
intersect is therefore small. We first compare the subspaces spanned by the forecast errors and by 
the forecast anomalies by comparing their principle angles (see Appendix). Figure 8 shows that 
there is reasonable correspondence between the subspaces spanned by the forecast error and the 
forecast anomaly. The correspondence is less at 48-hours than at 24 hours. At both lead times there 
are principle angles of about 80° indicating that there are forecast errors that project very weakly 
onto the subspace spanned by the forecast anomaly. 

For the forecast anomaly dynamics to be able to amplify efficiently 24-hour forecast errors into 
48-hour forecast errors, there must be a favorable relationship between the leading subspaces of the 
error covariance and the singular vectors of the dynamics. Namely, the 24-hour forecast errors must 
project onto the leading right singular vectors of the dynamics and the 48-hour forecast errors must 
project onto the left singular vectors of the dynamics. In Fig. 9 we compare the subspaces spanned 
by the errors and by the singular vectors of the Markov model. In Fig. 9(a) we observed that the 
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Figure 10: As in Fig. 2 with the addition of the eigenvalues of MP 2 4 M T (solid line with x’s). 



Figure 11: Contour plots of the standard deviation of (a) Q and (b) MP 24 M 7 shown at 500 mb. 
Units are meters. 


Figure 1 1 where the standard deviation of Q and MP 24 M r are shown demonstrates that the model 
error dominates the propagated forecast error and is little different from the 48-hour forecast error. 


4 Concluding Remarks 

Characterization of prediction model error is important for understanding the performance of data 
assimilation systems. Advanced sequential data assimilation methods capable of calculating the 
propagated analysis error require specification of the model error. 4D-Var adjoint methods assume 
that the model error is negligible. 

A measure of the relative size of the model error is the Talagrand ratio r, the fraction of the 
forecast error due to model error. We have proposed two definitions for r. The first is simply the 
ratio of the model error variance to forecast error variance; the underlying norm is the RMS one. 
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